Analysis enhancement - better plural stemmer than minimal_english. #43248

Open
wants to merge 7 commits into
base: main
from

Conversation

markharwood
Contributor

Drops the trailing “e” in taxes, dresses, watches, dishes, etc., which otherwise causes the plural and singular forms to stem to different tokens and fail to match.
See the issue for recall benchmarking results on typical data.
Closes #42892
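
For illustration, the rule described above could be sketched roughly as follows. This is a minimal sketch and not the code in this PR: the class name `PluralStemSketch`, the method `stem`, and the exact list of endings are assumptions inferred from the examples given (taxes, dresses, watches, dishes). Other `-ches` words are deliberately left alone, which is consistent with the known issue recorded in the tests (lunches stems to lunche).

```java
/** Hypothetical illustration only; not the stemmer code from this PR. */
final class PluralStemSketch {
    // Endings where the plural adds "es", so both letters are dropped.
    // This list is an assumption inferred from the examples in the description.
    private static final String[] ES_ENDINGS = {"xes", "sses", "shes", "tches"};

    static String stem(String word) {
        if (word.length() < 3 || !word.endsWith("s")) {
            return word; // too short, or does not look like a plural
        }
        for (String ending : ES_ENDINGS) {
            if (word.endsWith(ending)) {
                // taxes -> tax, dresses -> dress, watches -> watch
                return word.substring(0, word.length() - 2);
            }
        }
        if (word.endsWith("ss") || word.endsWith("us")) {
            return word; // dress, bus: the final "s" is not a plural marker
        }
        // Default: strip only the "s", so lunches -> lunche (the known issue).
        return word.substring(0, word.length() - 1);
    }
}
```

With this sketch, `PluralStemSketch.stem("taxes")` and `PluralStemSketch.stem("tax")` both yield `tax`, so the two forms match at query time.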

@markharwood markharwood self-assigned this Jun 14, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search


// *CHES - would be good to find a simple rule that solves lunches, churches but doesn't break aches
// documenting current behaviour here as a known issue:
assertAnalyzesTo(analyzer, "lunches", new String[]{"lunche"});
Contributor

Looking at http://www.thefreedictionary.com/words-that-end-in-ches I think that 'avalanche' and 'headache' are the odd ones out here - consider brooch/brooches, branch/branches, couch/couches. Maybe have a specific rule for -ache, and some exceptions?
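
One way to read this suggestion, as a minimal sketch rather than anything taken from the PR: strip `es` from `-ches` words by default, keep the `e` for words ending in `-aches`, and consult a small exception list for the rest. The class name `ChesRuleSketch`, the method `stemChes`, and the contents of the exception set are all hypothetical.

```java
import java.util.Set;

/** Hypothetical illustration of an "-ache rule plus exceptions" approach. */
final class ChesRuleSketch {
    // Assumed exception list: plurals whose singular already ends in "che"
    // but which are not covered by the "-aches" test below.
    private static final Set<String> EXCEPTIONS = Set.of("avalanches");

    static String stemChes(String word) {
        if (!word.endsWith("ches")) {
            return word; // rule only applies to -ches words
        }
        if (word.endsWith("aches") || EXCEPTIONS.contains(word)) {
            // aches -> ache, headaches -> headache, avalanches -> avalanche
            return word.substring(0, word.length() - 1);
        }
        // lunches -> lunch, churches -> church, branches -> branch
        return word.substring(0, word.length() - 2);
    }
}
```

A plain `-aches` suffix test also matches words such as coaches and beaches, so a real rule would need either a narrower test or a longer exception list.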

Drops the trailing “e” in taxes, dresses, watches, etc., which otherwise causes mismatches between plural and singular forms

Closes elastic#42892
Made ies->y stemming stricter so that short words match, e.g. ties==tie
Removed special-case code for the extremely rare words ending in iaes and eies
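
A stricter `ies` rule of this kind might look like the sketch below. It is an assumption-laden illustration, not the committed code: the class name `IesRuleSketch`, the method `stemIes`, and in particular the length cutoff of four characters are hypothetical choices made so that ties stems to tie.

```java
/** Hypothetical illustration of a length-guarded ies -> y rule. */
final class IesRuleSketch {
    // Assumed cutoff: "ies" words of this length or shorter keep their "ie".
    private static final int SHORT_WORD_MAX_LENGTH = 4;

    static String stemIes(String word) {
        if (!word.endsWith("ies")) {
            return word; // rule only applies to -ies words
        }
        if (word.length() <= SHORT_WORD_MAX_LENGTH) {
            // Short words only lose the "s": ties -> tie, pies -> pie
            return word.substring(0, word.length() - 1);
        }
        // Longer words swap "ies" for "y": parties -> party
        return word.substring(0, word.length() - 3) + "y";
    }
}
```

Under these assumptions `stemIes("ties")` returns `tie`, which equals the unstemmed singular, while `stemIes("parties")` still returns `party`.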
@jimczi
Contributor

jimczi commented Apr 23, 2021

What's the status of this PR, @markharwood? Is it still relevant?

@arteam arteam added v8.1.0 and removed v8.0.0 labels Jan 12, 2022
@mark-vieira mark-vieira added v8.2.0 and removed v8.1.0 labels Feb 2, 2022
@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:14
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@csoulios csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
@rjernst rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023
@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@quux00 quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@mattc58 mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023
@javanna javanna added Team:Search Relevance and removed Team:Search labels Jul 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Labels
>enhancement, :Search Relevance/Analysis, Team:Search Relevance, v9.0.0
Development

Successfully merging this pull request may close these issues.

English-minimal analyzer has bad plural stemming